An Improved Policy Iteration Algorithm for Partially Observable MDPs
Author
Abstract
A new policy iteration algorithm for partially observable Markov decision processes is presented that is simpler and more efficient than an earlier policy iteration algorithm of Sondik (1971, 1978). The key simplification is representation of a policy as a finite-state controller. This representation makes policy evaluation straightforward. The paper's contribution is to show that the dynamic-programming update used in the policy improvement step can be interpreted as the transformation of a finite-state controller into an improved finite-state controller. The new algorithm consistently outperforms value iteration as an approach to solving infinite-horizon problems.
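To make the claim that this representation renders policy evaluation straightforward concrete, here is a minimal Python sketch (not the paper's implementation): the value of each (node, state) pair of a deterministic finite-state controller satisfies a linear system, which one matrix solve resolves. The function name evaluate_fsc and the array layouts for T, O, and R are illustrative assumptions.

```python
import numpy as np

def evaluate_fsc(T, O, R, actions, trans, gamma=0.95):
    """Evaluate a deterministic finite-state controller for a POMDP.

    Assumed (illustrative) array layouts:
      T[a, s, s2]  : state-transition probabilities P(s2 | s, a)
      O[a, s2, o]  : observation probabilities P(o | s2, a)
      R[s, a]      : expected immediate reward
      actions[n]   : action chosen by controller node n
      trans[n, o]  : successor node after observing o in node n

    Returns V of shape (num_nodes, num_states), where V[n, s] is the
    expected discounted return of starting the controller in node n
    with hidden state s, found by solving the linear system
      V(n,s) = R(s,a_n) + gamma * sum_{s2,o} T(s,a_n,s2) O(o|s2,a_n) V(trans(n,o),s2).
    """
    num_nodes = len(actions)
    num_states = T.shape[1]
    dim = num_nodes * num_states
    A = np.eye(dim)
    b = np.zeros(dim)
    for n in range(num_nodes):
        a = actions[n]
        for s in range(num_states):
            i = n * num_states + s
            b[i] = R[s, a]
            for s2 in range(num_states):
                for o in range(O.shape[2]):
                    j = trans[n, o] * num_states + s2
                    A[i, j] -= gamma * T[a, s, s2] * O[a, s2, o]
    return np.linalg.solve(A, b).reshape(num_nodes, num_states)
```

With these values in hand, the policy improvement step can compare candidate controller nodes produced by the dynamic-programming update against the existing nodes.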
Related Papers
Bounded Finite State Controllers
We describe a new approximation algorithm for solving partially observable MDPs. Our bounded policy iteration approach searches through the space of bounded-size, stochastic finite state controllers, combining several advantages of gradient ascent (efficiency, search through restricted controller space) and policy iteration (less vulnerability to local optima).
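One reason this restricted controller space stays tractable is that evaluating a stochastic finite-state controller is still a linear solve, the property bounded policy iteration relies on. The sketch below illustrates this under assumed notation: psi and eta name the controller's action-selection and node-transition distributions for illustration only.

```python
import numpy as np

def evaluate_stochastic_fsc(T, O, R, psi, eta, gamma=0.95):
    """Evaluate a stochastic finite-state controller for a POMDP.

    Assumed (illustrative) array layouts:
      psi[n, a]        : action-selection probabilities P(a | n)
      eta[n, a, o, n2] : node-transition probabilities P(n2 | n, a, o)
      (T, O, R laid out as in the deterministic sketch above.)

    Returns V of shape (num_nodes, num_states): the value of running
    the controller from node n when the hidden state is s.
    """
    num_nodes, num_actions = psi.shape
    num_states = T.shape[1]
    dim = num_nodes * num_states
    A = np.eye(dim)
    b = np.zeros(dim)
    for n in range(num_nodes):
        for s in range(num_states):
            i = n * num_states + s
            b[i] = psi[n] @ R[s]  # expected immediate reward in node n
            for a in range(num_actions):
                for s2 in range(num_states):
                    for o in range(O.shape[2]):
                        p = psi[n, a] * T[a, s, s2] * O[a, s2, o]
                        for n2 in range(num_nodes):
                            A[i, n2 * num_states + s2] -= gamma * p * eta[n, a, o, n2]
    return np.linalg.solve(A, b).reshape(num_nodes, num_states)
```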
Probabilistic Planning with Risk-Sensitive Criterion
Probabilistic planning models and, in particular, Markov Decision Processes (MDPs), Partially Observable Markov Decision Processes (POMDPs) and Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) have been extensively used by AI and Decision Theoretic communities for planning under uncertainty. Typically, the solvers for probabilistic planning models find policies that min...
An Improved Policy Iteration Algorithm
A new policy iteration algorithm for partially observable Markov decision processes is presented that is simpler and more efficient than an earlier policy iteration algorithm of Sondik (1971,1978). The key simplification is representation of a policy as a finite-state controller. This representation makes policy evaluation straightforward. The paper's contribution is to show that the dynamic-pr...
Markov decision processes with observation costs
A partially observable Markov decision process (POMDP) is a generalization of a Markov decision process in which observation of the process state can be imperfect and/or costly. Although it provides an elegant model for control and planning problems that include information-gathering actions, the best current algorithms for POMDPs are computationally infeasible for all but small problems. One a...
Learning Policies in Partially Observable MDPs with Abstract Actions Using Value Iteration
While the use of abstraction and its benefit in terms of transferring learned information to new tasks has been studied extensively and successfully in MDPs, it has not been studied in the context of Partially Observable MDPs. This paper addresses the problem of transferring skills from previous experiences in POMDP models using high-level actions (options). It shows that the optimal value func...